Abstract

We describe a statistical hypothesis test for the presence of a signal based on the
likelihood ratio statistic. We derive the test for a special case of interest. We
study extensions of the test to cases where there are multiple channels and to
marked Poisson distributions. We show the results of a number of performance
studies which indicate that the test works very well, even far out in the tails of
the distribution.

1. Introduction

One of the main goals of the upcoming experiments at the Large Hadron 
Collider at CERN will be to make discoveries, for example of the Higgs boson. 
To do so it will be necessary to make use of all available information; that
is, we will need to use data from multiple channels as well as auxiliary
measurements. In this paper we will describe a test capable of doing so, based
on the likelihood ratio test statistic. The main contribution of this paper is the 
study of the performance of this test. 

Discoveries in high energy physics require a very small false-positive rate; that
is, the probability of falsely claiming a discovery has to be very small. This
probability, in statistics called the type I error probability $\alpha$, is sometimes
required to be as low as $2.87 \cdot 10^{-7}$, equivalent to a $5\sigma$ event. The
likelihood ratio test is an approximate test, and what sample sizes are necessary
for the approximation to work, especially this far out in the tail, is a question
that needed to be investigated.
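
As a quick check of the number quoted above, the following minimal Python sketch
computes the one-sided Gaussian tail probability that corresponds to a given number
of standard deviations. The function name `one_sided_tail` is hypothetical and is
used here only for illustration.

```python
import math

def one_sided_tail(n_sigma):
    """Upper-tail probability P(Z > n_sigma) for a standard normal Z."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

# Type I error probability equivalent to a one-sided 5 sigma event
alpha_5sigma = one_sided_tail(5.0)
print(f"{alpha_5sigma:.3e}")  # close to the 2.87e-07 quoted in the text
```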

2. Likelihood Ratio Test 

The general problem of discovery is as follows: we have data $\mathbf{X}$ from a
distribution with density $f(\mathbf{x}; \theta)$, where $\theta$ is a vector of
parameters with $\theta \in \Theta$ and $\Theta$ is the entire parameter space. We
wish to test the null hypothesis $H_0: \theta \in \Theta_0$ (no signal) vs the
alternative hypothesis $H_a: \theta \in \Theta_0^c$ (some signal), where $\Theta_0$
is some subset of $\Theta$. The likelihood function is given by

$$\mathrm{Like}(\theta|\mathbf{x}) = f(\mathbf{x}; \theta)$$

and the likelihood ratio test statistic is defined by

$$\lambda(\mathbf{x}) = \frac{\sup_{\Theta_0} \mathrm{Like}(\theta|\mathbf{x})}{\sup_{\Theta} \mathrm{Like}(\theta|\mathbf{x})}$$

Because $\mathrm{Like}(\theta|\mathbf{x}) \ge 0$ and because the supremum in the
numerator is taken over a subset of the supremum in the denominator we have
$0 \le \lambda(\mathbf{x}) \le 1$. The likelihood ratio test rejects the null
hypothesis if $\lambda(\mathbf{x}) \le c$, for some suitably chosen $c$, which in
turn depends on the type I error probability $\alpha$.

How do we find $c$? There is of course a famous theorem that states that
under some mild regularity conditions, if $\theta \in \Theta_0$ then
$L(\mathbf{x}) = -2\log\lambda(\mathbf{x})$ has a chi-square distribution as the
sample size $n \to \infty$. The degrees of freedom of the chi-square distribution
is the difference between the number of free parameters specified by
$\theta \in \Theta_0$ and the number of free parameters specified by
$\theta \in \Theta$.

A proof of this theorem is given in Stuart, Ord and Arnold [1] and a nice
discussion with examples can be found in Casella and Berger [2]. Unfortunately the
theorem does not apply to our case; nevertheless, as we shall see, the conclusion
does.

3. An Example: A Counting Experiment with Background, Efficiency 
and Acceptance 

We begin with a very common type of situation in high energy physics experiments.
This is a search for a particle by observing a particular decay channel.
After suitably chosen cuts we find $n$ events in the signal region, some of which
may be signal events. We can model $n$ as a random variable $N$ with a Poisson
distribution with rate $res + b$, where $b$ is the background rate, $s$ the signal
rate for the production of the particle, $e$ the efficiency for observing the
particular decay channel and $r$ the branching fraction to that channel. We also
have an independent measurement $y$ of the background rate, either from data
sidebands or from Monte Carlo, and we can model $y$ as a Gaussian random variable
$Y$ with mean $b$ and standard deviation $\sigma_b$. Finally we have an independent
measurement $z$ of the efficiency, usually from Monte Carlo, and we will model $z$
as a Gaussian random variable $Z$ with mean $e$ and standard deviation $\sigma_e$.
$\sigma_b$ and $\sigma_e$ as well as the branching fraction $r$ are assumed to be
known. So we have the following probability model:

$$N \sim \mathrm{Pois}(res + b), \quad Y \sim N(b, \sigma_b), \quad Z \sim N(e, \sigma_e)$$

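In this model $s$ is the parameter of interest and $e$ and $b$ are nuisance
parameters. Now the joint density of $N$, $Y$ and $Z$ is given by

$$P(N = n, Y = y, Z = z)\,dy\,dz = f(n, y, z; e, s, b) = \frac{(res+b)^n}{n!} e^{-(res+b)} \cdot \frac{1}{\sqrt{2\pi\sigma_b^2}} e^{-\frac{(y-b)^2}{2\sigma_b^2}} \cdot \frac{1}{\sqrt{2\pi\sigma_e^2}} e^{-\frac{(z-e)^2}{2\sigma_e^2}}$$

Finding the likelihood ratio test statistic $\lambda$ means maximizing the density
above (now viewed as the likelihood) twice, once over all parameters and then
again assuming $s = 0$. We find

$$L(n, y, z) = -2\log\lambda(n, y, z) = 2n\log(n/\hat{b}) + 2\hat{b} - 2n + \frac{(y - \hat{b})^2}{\sigma_b^2}$$

where

$$\hat{b} = \frac{1}{2}\left( y - \sigma_b^2 + \sqrt{(y - \sigma_b^2)^2 + 4n\sigma_b^2} \right)$$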



First we note that the test statistic does not involve z, the estimate of the 
efficiency, nor does it involve r, the branching fraction. This is true for the one 
channel case but will no longer hold for multiple channels, although we will find 
that the test is sensitive only to the relative efficiencies and relative branching 
ratios between channels, quantities which are usually known more precisely than 
the absolute values. 

Now from the general theory we know that $L(N, Y, Z)$ has a chi-square
distribution with 1 degree of freedom because in the general model there are 3 free
parameters and under the null hypothesis there are 2.

Large values of $L(n, y, z)$ indicate that the null hypothesis is wrong and
should be rejected. Such large values happen if $n$ is much larger than $y$ but
also if $n$ is much smaller. Here, though, we will only reject the null hypothesis
if we have more events in the signal region than are expected from background,
and therefore we reject the null hypothesis if $L(n, y, z) > q_{\chi^2_1}(1 - 2\alpha)$
and also $n > y$. Here $q_{\chi^2_1}(p)$ is the $p^{th}$ percentile of a chi-square
distribution with one degree of freedom. The level $1 - 2\alpha$ is used rather than
$1 - \alpha$ because only one of the two tails leads to rejection.
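
The one-channel test can be sketched in a few lines of Python. This is a minimal
illustration, not the authors' code: the function names `b_hat_null`,
`lrt_statistic` and `claim_discovery` are hypothetical. The critical value relies
on the fact that, for one degree of freedom, the chi-square quantile at
$1 - 2\alpha$ equals the square of the upper-$\alpha$ standard normal quantile.

```python
import math
from statistics import NormalDist

def b_hat_null(n, y, sigma_b):
    """Maximum likelihood estimate of the background rate b when s = 0."""
    d = y - sigma_b ** 2
    return 0.5 * (d + math.sqrt(d * d + 4.0 * n * sigma_b ** 2))

def lrt_statistic(n, y, sigma_b):
    """L(n, y, z) = -2 log(lambda); z and r do not enter in the one-channel case."""
    b0 = b_hat_null(n, y, sigma_b)
    log_term = 2.0 * n * math.log(n / b0) if n > 0 else 0.0
    return log_term + 2.0 * (b0 - n) + (y - b0) ** 2 / sigma_b ** 2

def claim_discovery(n, y, sigma_b, alpha=2.87e-7):
    """Reject H0 only for an excess: L > q(1 - 2*alpha) and n > y."""
    z_alpha = -NormalDist().inv_cdf(alpha)   # upper-tail standard normal quantile
    critical_value = z_alpha ** 2            # chi-square(1) quantile at 1 - 2*alpha
    return n > y and lrt_statistic(n, y, sigma_b) > critical_value

# Illustrative inputs: background estimate y = 50 with sigma_b = 5
print(lrt_statistic(120, 50.0, 5.0), claim_discovery(120, 50.0, 5.0))
print(lrt_statistic(60, 50.0, 5.0), claim_discovery(60, 50.0, 5.0))
```

When $n = y$ the null estimate $\hat{b}$ equals $y$ and the statistic is zero, which
is a convenient sanity check for any implementation.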

A similar problem, where the background is modeled as a Poisson rather than
a Gaussian, is discussed in much more detail in Rolke and Lopez [3]. The closely
related problem of setting limits was studied in Rolke, Lopez and Conrad [4].



4. Extensions of the Model 

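4.1. Multiple Channels

In high energy physics we can sometimes make use of multiple channels.
We will discuss the following model: there are $k$ channels and we have
$N_i \sim \mathrm{Pois}(r_i e_i s + b_i)$, $Y_i \sim N(b_i, \sigma_{b_i})$,
$Z_i \sim N(e_i, \sigma_{e_i})$, $i = 1,..,k$, all independent.
The joint density is then found as follows: let $\mathbf{n} = (n_1,..,n_k)$,
$\mathbf{y} = (y_1,..,y_k)$, $\mathbf{z} = (z_1,..,z_k)$, $\mathbf{b} = (b_1,..,b_k)$,
$\mathbf{e} = (e_1,..,e_k)$, then

$$f(\mathbf{n},\mathbf{y},\mathbf{z}; s,\mathbf{b},\mathbf{e}) = \prod_{i=1}^{k} \frac{(r_i e_i s + b_i)^{n_i}}{n_i!} e^{-(r_i e_i s + b_i)} \cdot \frac{1}{\sqrt{2\pi\sigma_{b_i}^2}} \exp\left(-\frac{(y_i - b_i)^2}{2\sigma_{b_i}^2}\right) \cdot \frac{1}{\sqrt{2\pi\sigma_{e_i}^2}} \exp\left(-\frac{(z_i - e_i)^2}{2\sigma_{e_i}^2}\right)$$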

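The log-likelihood function is given by:

$$\log \mathrm{Like}(s,\mathbf{b},\mathbf{e}; \mathbf{n},\mathbf{y},\mathbf{z}) = \sum_{i=1}^{k} \left[ n_i \log(r_i e_i s + b_i) - \log(n_i!) - (r_i e_i s + b_i) - \frac{1}{2}\log(2\pi\sigma_{b_i}^2) - \frac{(y_i - b_i)^2}{2\sigma_{b_i}^2} - \frac{1}{2}\log(2\pi\sigma_{e_i}^2) - \frac{(z_i - e_i)^2}{2\sigma_{e_i}^2} \right]$$

and taking derivatives we find the following system of equations for the
maximum likelihood estimators:

$$\sum_{i=1}^{k} \left( \frac{n_i r_i e_i}{r_i e_i s + b_i} - r_i e_i \right) = 0$$

$$\frac{n_i}{r_i e_i s + b_i} - 1 + \frac{y_i - b_i}{\sigma_{b_i}^2} = 0, \quad i = 1,..,k$$

$$\frac{n_i r_i s}{r_i e_i s + b_i} - r_i s + \frac{z_i - e_i}{\sigma_{e_i}^2} = 0, \quad i = 1,..,k$$
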



This system cannot be solved analytically but it is fairly easy to do so
numerically, for example with MINUIT. In addition, it can be shown analytically
that the likelihood ratio test statistic depends not on the absolute values of the
efficiencies and branching fractions but only on the ratios between the values
for the different channels.

For the numerator of the likelihood ratio statistic we have $s = 0$ and the
corresponding system has the solutions

$$\hat{b}_i = \frac{1}{2}\left( y_i - \sigma_{b_i}^2 + \sqrt{(y_i - \sigma_{b_i}^2)^2 + 4 n_i \sigma_{b_i}^2} \right), \quad i = 1,..,k$$

$$\hat{e}_i = z_i, \quad i = 1,..,k$$

As above we will claim a discovery only if there is an excess of events in the
signal region. If we denote the test statistic by $L(\mathbf{n},\mathbf{y},\mathbf{z})$
this means that we reject the null hypothesis of no signal if
$L(\mathbf{n},\mathbf{y},\mathbf{z}) > q_{\chi^2_1}(1 - 2\alpha)$ and also $\hat{s} > 0$,
where $\hat{s}$ is the maximum likelihood estimator of the true signal $s$.
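
One possible way to evaluate the multi-channel statistic numerically is sketched
below in plain Python. It is not the MINUIT-based approach mentioned above.
Instead it profiles out the nuisance parameters: for a fixed $s$, the score
equations for $b_i$ and $e_i$ can be rearranged into a quadratic for the fitted
Poisson mean $\mu_i = r_i e_i s + b_i$, and the remaining one-dimensional
maximization over $s \ge 0$ is done by golden-section search. That rearrangement,
the function names, the assumption $n_i > 0$, the assumption that the profile is
unimodal in $s$, and the user-supplied upper bound `s_max` are all choices made for
this illustration and are not taken from the original text.

```python
import math

def profile_loglik(s, n, y, z, r, sig_b, sig_e):
    """Log-likelihood maximized over every b_i and e_i at fixed s (constants dropped)."""
    total = 0.0
    for ni, yi, zi, ri, sb, se in zip(n, y, z, r, sig_b, sig_e):
        a = ri * zi * s + yi                  # plug-in Poisson mean r_i z_i s + y_i
        v = (ri * s * se) ** 2 + sb ** 2      # Gaussian variance of that plug-in mean
        d = a - v
        # positive root of mu^2 + (v - a) mu - n_i v = 0 (assumes n_i > 0)
        mu = 0.5 * (d + math.sqrt(d * d + 4.0 * ni * v))
        total += ni * math.log(mu) - mu - (mu - a) ** 2 / (2.0 * v)
    return total

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a function assumed unimodal on [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:                           # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = f(c)
        else:                                 # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)

def lrt_multichannel(n, y, z, r, sig_b, sig_e, s_max):
    """Return (L, s_hat) with the signal rate restricted to 0 <= s <= s_max."""
    f = lambda s: profile_loglik(s, n, y, z, r, sig_b, sig_e)
    s_hat = golden_section_max(f, 0.0, s_max)
    return max(0.0, 2.0 * (f(s_hat) - f(0.0))), s_hat

# Two illustrative channels with a modest excess in each
stat, s_hat = lrt_multichannel([70, 65], [50.0, 50.0], [0.9, 0.9], [0.15, 0.15],
                               [5.0, 5.0], [0.09, 0.09], s_max=5000.0)
print(stat, s_hat)
```

Because the search is restricted to $s \ge 0$, the statistic is essentially zero
when there is no excess, so comparing it to the critical value already reflects the
one-sided rule, provided the unimodality assumption holds. With a single channel the
result should agree with the closed-form one-channel statistic.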

4.2. Extension II: Marked Poisson

It is sometimes possible to include further information in this model. Consider
the following case: in the $i^{th}$ channel we observe $n_i$ events in the signal
region and $y_i$ events in the background region. We have an independent
measurement $z_i$ of the efficiency. Furthermore we have measurements $x_{ij}$,
$j = 1,..,n_i$ for each event in the signal region and we know the distributions of
these measurements depending on whether an event is signal or background. This
leads to the following log-likelihood function:

$$\log \mathrm{Like}(s,\mathbf{b},\mathbf{e}; \mathbf{n},\mathbf{y},\mathbf{z},\mathbf{x}) = \sum_{i=1}^{k} \left[ n_i \log(r_i e_i s + b_i) - \log(n_i!) - (r_i e_i s + b_i) - \frac{1}{2}\log(2\pi\sigma_{b_i}^2) - \frac{(y_i - b_i)^2}{2\sigma_{b_i}^2} - \frac{1}{2}\log(2\pi\sigma_{e_i}^2) - \frac{(z_i - e_i)^2}{2\sigma_{e_i}^2} + \sum_{j=1}^{n_i} \log\left( \frac{r_i e_i s f_i^s(x_{ij}) + b_i f_i^b(x_{ij})}{r_i e_i s + b_i} \right) \right]$$

where $f_i^s$ and $f_i^b$ are the densities of the signal and the background in the
$i^{th}$ channel, respectively. In this paper we will assume that $f_i^s$ and
$f_i^b$ are fully known but it would be easy to let them depend on nuisance
parameters as well. In some applications these densities might be estimated from
the data, for example using neural networks. Furthermore, this model allows for a
"mixture" case: if in some channels no measurements $x_{ij}$ are available we only
need to set $f_i^s$ and $f_i^b$ equal to 1.

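The expression above simplifies somewhat if we set $f_{ij}^s = f_i^s(x_{ij})$ and
$f_{ij}^b = f_i^b(x_{ij})$ and omit any constant terms:

$$\log \mathrm{Like}(s,\mathbf{b},\mathbf{e}; \mathbf{n},\mathbf{y},\mathbf{z},\mathbf{x}) = \sum_{i=1}^{k} \left[ -(r_i e_i s + b_i) - \frac{(y_i - b_i)^2}{2\sigma_{b_i}^2} - \frac{(z_i - e_i)^2}{2\sigma_{e_i}^2} + \sum_{j=1}^{n_i} \log\left( r_i e_i s f_{ij}^s + b_i f_{ij}^b \right) \right]$$

Finding the maximum likelihood estimators now means solving the following
nonlinear system of $2k + 1$ equations:

$$\sum_{i=1}^{k} \left( \sum_{j=1}^{n_i} \frac{r_i e_i f_{ij}^s}{r_i e_i s f_{ij}^s + b_i f_{ij}^b} - r_i e_i \right) = 0$$

$$\frac{y_i - b_i}{\sigma_{b_i}^2} - 1 + \sum_{j=1}^{n_i} \frac{f_{ij}^b}{r_i e_i s f_{ij}^s + b_i f_{ij}^b} = 0, \quad i = 1,..,k$$

$$\frac{z_i - e_i}{\sigma_{e_i}^2} - r_i s + \sum_{j=1}^{n_i} \frac{r_i s f_{ij}^s}{r_i e_i s f_{ij}^s + b_i f_{ij}^b} = 0, \quad i = 1,..,k$$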

Again this system cannot be solved analytically. For the numerator of the
likelihood ratio statistic we have $s = 0$ and the corresponding system has the
same solutions as the corresponding system in section 4.1. The test is then
again: reject the null hypothesis of no signal if
$L(\mathbf{n},\mathbf{y},\mathbf{z},\mathbf{x}) > q_{\chi^2_1}(1 - 2\alpha)$ and
$\hat{s} > 0$.
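
To make the marked Poisson model concrete, here is a minimal Python sketch that
evaluates the simplified log-likelihood (constant terms omitted) for given parameter
values. The function name `marked_loglik` and the nested-list layout of the
per-event densities are assumptions made for this illustration only; in practice
this function would be handed to a numerical optimizer.

```python
import math

def marked_loglik(s, b, e, y, z, r, sig_b, sig_e, fs, fb):
    """Simplified marked-Poisson log-likelihood, constant terms omitted.

    fs[i][j] and fb[i][j] hold the signal and background densities evaluated
    at the j-th event of channel i (hypothetical data layout).
    """
    total = 0.0
    for i in range(len(b)):
        sig_rate = r[i] * e[i] * s                      # r_i e_i s
        total -= sig_rate + b[i]                        # Poisson rate term
        total -= (y[i] - b[i]) ** 2 / (2.0 * sig_b[i] ** 2)
        total -= (z[i] - e[i]) ** 2 / (2.0 * sig_e[i] ** 2)
        for fs_ij, fb_ij in zip(fs[i], fb[i]):          # one term per observed event
            total += math.log(sig_rate * fs_ij + b[i] * fb_ij)
    return total

# "Mixture" case: with all densities set to 1 the event sum reduces to
# n_i * log(r_i e_i s + b_i), as in the model without marks.
ones = [[1.0] * 7]
value = marked_loglik(10.0, [5.0], [0.9], [6.0], [0.8], [0.5], [2.0], [0.1], ones, ones)
print(value)
```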

5. Performance 

How do the above tests perform? In order to be a proper test they first of
all have to achieve the nominal type I error probability $\alpha$. If they do, we
can then further study their performance by considering their power function
$\beta(s)$ given by

$$\beta(s) = P(\text{reject } H_0 \mid \text{true signal rate is } s)$$

Of course, we have $\beta(0) = \alpha$. $\beta(s)$ gives us the discovery potential,
that is the probability of correctly claiming a discovery if the true signal rate
is $s > 0$.

Performance studies for the case of one channel were previously done in
Rolke and Lopez [3].

In high energy physics discoveries usually require a very small type I error
probability, often as small as $\alpha = 2.87 \cdot 10^{-7}$, equivalent to a
$5\sigma$ event. A straightforward simulation study would therefore need a very
large number of runs. Instead of a simple MC study we will use a technique called
importance sampling to estimate the true type I error probability. It works as
follows. In a straightforward MC study we would generate
$N_i \sim \mathrm{Pois}(b_i)$, $Y_i \sim N(b_i, \sigma_{b_i})$,
$Z_i \sim N(e_i, \sigma_{e_i})$, $X_{ij} \sim F_i^b$, where $F_i^b$ is the
distribution of the background events in channel $i$, with $i = 1,..,k$,
$j = 1,..,N_i$. Then we would calculate $L(\mathbf{N},\mathbf{Y},\mathbf{Z},\mathbf{X})$
and find the percentage of runs where
$L(\mathbf{N},\mathbf{Y},\mathbf{Z},\mathbf{X}) > q_{\chi^2_1}(1 - 2\alpha)$ and
$\hat{s} > 0$.

At $5\sigma$, though, this will only happen about 1 in every 3.5 million runs. So
instead we will generate the MC data as if the true observed signal rate in
every channel were $t$, that is we generate $N_i^s \sim \mathrm{Pois}(t)$,
$N_i^b \sim \mathrm{Pois}(b_i)$, $Y_i \sim N(b_i, \sigma_{b_i})$,
$Z_i \sim N(e_i, \sigma_{e_i})$, $X_{ij} \sim F_i^s$ for $j = 1,..,N_i^s$ and
$X_{ij} \sim F_i^b$ for $j = N_i^s + 1,..,N_i^s + N_i^b$ $(= N_i)$, respectively.
For a suitably chosen $t$, $L(\mathbf{N},\mathbf{Y},\mathbf{Z},\mathbf{X})$ will be
of the order of the critical value reasonably often. We generate $M$ MC samples and
estimate the true type I error as

$$\hat{\alpha} = \frac{1}{M} \sum_{m=1}^{M} w_m \, I\left[ L(\mathbf{N},\mathbf{Y},\mathbf{Z},\mathbf{X}) > q_{\chi^2_1}(1 - 2\alpha),\ \hat{s} > 0 \right]$$




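where the weights $w_m$ are given by the likelihood ratio of the true density and
the one used for the sampling:

$$w_m = \prod_{i=1}^{k} \frac{\mathrm{Pois}(n_i; b_i) \prod_{j=1}^{n_i} f_i^b(x_{ij})}{\mathrm{Pois}(n_i; t + b_i) \prod_{j=1}^{n_i} \frac{t f_i^s(x_{ij}) + b_i f_i^b(x_{ij})}{t + b_i}} = e^{kt} \prod_{i=1}^{k} \prod_{j=1}^{n_i} \frac{b_i f_i^b(x_{ij})}{t f_i^s(x_{ij}) + b_i f_i^b(x_{ij})}$$

For more on importance sampling see Srinivasan [5].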
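
The idea can be illustrated on a deliberately simplified problem. The Python sketch
below estimates a small Poisson tail probability $P(N \ge c)$ for
$N \sim \mathrm{Pois}(b)$ by sampling from $\mathrm{Pois}(b + t)$ and reweighting
each run by the ratio of the two Poisson probabilities. It uses a single channel, no
marks, and a plain threshold in place of the likelihood ratio statistic, so it is a
stand-in for the study described here rather than a reproduction of it. All function
names and numerical settings are assumptions chosen for illustration.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (moderate lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def tail_prob_importance(b, t, threshold, n_runs, seed=1):
    """Estimate P(N >= threshold) for N ~ Pois(b) by sampling N ~ Pois(b + t)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        n = sample_poisson(b + t, rng)
        if n >= threshold:
            # weight = Pois(n; b) / Pois(n; b + t) = e^t * (b / (b + t))^n
            total += math.exp(t + n * math.log(b / (b + t)))
    return total / n_runs

def tail_prob_exact(b, threshold, n_terms=200):
    """Direct summation of the Poisson upper tail, used only as a reference."""
    return sum(math.exp(-b + n * math.log(b) - math.lgamma(n + 1))
               for n in range(threshold, threshold + n_terms))

estimate = tail_prob_importance(b=10.0, t=20.0, threshold=30, n_runs=50000)
exact = tail_prob_exact(10.0, 30)
print(estimate, exact)
```

With the shifted rate the rare event occurs in roughly half of the runs, so a few
tens of thousands of runs are enough to estimate a probability of a few times
$10^{-7}$, whereas direct sampling would see such an event only once in millions of
runs.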

In figure 1 we have the result of the following study: we use 5 channels,
the background rates $b$ vary from 2 to 100 and are the same in all channels,
$\sigma_b = b/15$, $e = 0.9$, $\sigma_e = 0.1$, $r_i = 0.15$ in all channels. As we
can see the test achieves the nominal type I error for all cases.

Next we will consider what happens when the number of channels grows. In
figure 2 we have $k = 1$ to 50, in all channels $b = 25$ with $\sigma_b = 5/3$,
$e = 0.9$ with $\sigma_e = 0.09$ and $r = 1/k$. Again we achieve the nominal type I
error probability, even for 50 channels and at $5\sigma$.

In figure 3 we consider the power of the test. There are 10 channels, each
with a background rate $b = 50$, $\sigma_b = 5$, efficiency $e = 0.9$,
$\sigma_e = 0.09$ and branching ratio $r = 0.05$. At the $5\sigma$ level the total
signal rate has to be about 410 to have a 90% chance of claiming a discovery.

Now we turn to a study of the marked Poisson case. We will consider two
examples. In the first we have some auxiliary measurement thought to be able
to separate signal and background. The functions $f^s$ and $f^b$ have an (assumed
to be known) parameter $\gamma$ and are given by

r(x) = le?, l<a;<2 

/^(a;)^ie^, l<a;<2 
For small values of $\gamma$ there is a large distinction between signal and
background, for larger values the separation becomes smaller. Two cases are shown
in figure 4 in the top two panels with two different values of $\gamma$
corresponding to strong and almost no separation. $f^s$ is drawn in dashed lines
and $f^b$ in solid lines.

In a different example we use the mass distributions themselves. We assume
a flat background and a Gaussian signal with mean 0.5 and standard deviation
$\delta$. Again two cases of different separation between signal and background are
shown in figure 4, in the bottom panels. $f^s$ is drawn in dashed lines and $f^b$
in solid lines.

We begin as before with a study of the true type I error probability $\alpha$. In
figure 5 we have 5 channels. Each channel has a background rate $b$ from 5 to
50, $\sigma_b = b/5$, an efficiency of $e = 0.9$, $\sigma_e = 0.1$ and $r = 0.15$.
The + symbols are for example 1, $\gamma = 4$, x for example 1, $\gamma = 0.33$,
diamonds for example 2, $\delta = 0.25$ and upside down diamonds for example 2,
$\delta = 0.05$. For all those cases the method achieves the nominal type I error
probability $\alpha$.
Finally, in figure 6 we present a power study of those same marked Poisson
examples. The signal rate goes from 0 to 200. For example 1 we use
$\gamma = 2.0$, 1.0 and 0.5, for example 2 $\delta = 0.25$, 0.15 and 0.05, going
from a weak separation between signal and background by $f^s$ and $f^b$ to a
strong separation. Figure 6 clearly shows how much improvement is possible by using
the extra information contained in $f^s$ and $f^b$.

The studies here have used reasonable values for the parameters involved. 
For example, when using multiple channels, it is reasonable to use channels 
for which the product of efficiency times branching fraction is similar. In our 
studies these have been set equal. However, an exhaustive performance study is 
not possible because of the high dimensionality of the problem. Nevertheless, we 
believe that the uniformly excellent performance in studied cases is an indication 
that this test will perform very well in a wide range of cases. In general, though,
we would recommend that practitioners carry out their own simulation study
for their specific problem to ensure that the method also performs well there.

6. Summary 

We have discussed a hypothesis test for the presence of a signal. We ex- 
tended the test to the case of multiple channels as well as the use of auxiliary 
measurements using marked Poisson models. Studies of the performance of the 
test for typical cases yielded highly satisfactory results. 

Figure 1: Study of type I error $\alpha$ for case of 5 channels. The background
rates $b$ vary from 2 to 100 and are the same in all channels. $\sigma_b = b/15$,
$e = 0.9$, $\sigma_e = 0.1$, $r_i = 0.15$ in all channels.

Figure 2: Study of the effect of the number of channels on type I error. There are
$k$ channels ($k = 1$ to 50). In all channels $b = 25$ with $\sigma_b = 5/3$,
$e = 0.9$ with $\sigma_e = 0.09$ and $r = 1/k$.

Figure 3: Study of the power of the test. There are 10 channels, each with a
background rate $b = 50$, $\sigma_b = 5$, efficiency $e = 0.9$, $\sigma_e = 0.09$
and branching fraction $r = 0.05$.

Figure 4: Examples for the two types of distributions used in the study of the
marked Poisson case. In the upper two panels we have an auxiliary measurement for
the events, in the lower two panels the actual mass distributions are used.

Figure 5: Study of type I error using multiple channels and marked Poisson. There
are 5 channels. Each channel has a background rate $b$ from 5 to 50,
$\sigma_b = b/5$, an efficiency of $e = 0.9$, $\sigma_e = 0.1$ and $r = 0.15$. The +
symbols are for example 1, $\gamma = 4.0$, x for example 1, $\gamma = 0.33$,
diamonds for example 2, $\delta = 0.25$ and upside down diamonds for example 2,
$\delta = 0.05$.

Figure 6: Power study for case of multiple channels and marked Poisson. We use 5
channels, each channel has a background rate $b = 50$, $\sigma_b = 5$, an efficiency
of $e = 0.9$, $\sigma_e = 0.1$ and $r = 0.15$. The signal rate goes from 0 to 200.
For example 1 we use $\gamma = 2$, 1, and 0.5, for example 2 $\delta = 0.25$, 0.15
and 0.05, going from a weak separation between signal and background by $f^s$ and
$f^b$ to a strong separation.